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Abstract 

In this paper, we consider the problem of finding the Least Squares estimators of two isotonic 
regression curves g° and under the additional constraint that they are ordered; e.g., g° < §2- Given 
two sets of n data points yi , . . . , ?/„ and zi , . . . , z„ observed at (the same) design points, the esti- 
mates of the true curves are obtained by minimizing the weighted Least Squares criterion L2{a, b) = 
Sj=i(2/j over the class of pairs of vectors (a, 6) G M" x M" such 

that ai < 02 < . . . < a„, hi < b2 < ■ ■ ■ < bn, and a; < bi,i = 1, . . . , n. The characterization of 
the estimators is established. To compute these estimators, we use an iterative projected subgradient 
algorithm, where the projection is performed with a "generalized" pool-adjacent-violaters algorithm 
(PAVA), a byproduct of this work. Then, we apply the estimation method to real data from mechanical 
engineering. 



Keywords: least squares; monotone regression; pool-adjacent-violaters algorithm; shape constraint esti- 
mation; subgradient algorithm 
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1 Introduction and motivation 



Estimating a mon otone regressi on curve is one of the most classical estimation problems under shape re- 
strictions, see e.g. iBrunM (119581) . A regression curve is said to be isotonic if it is monotone nondecreasing. 
We chose in this paper to look at the class of isotonic regression functions. The simple transformation 
g — > —g suffices for the results of this paper to carry over to the antitonic class. 

Given n fixed points xi, . . . , Xn, assume that we observe yi at Xj for z = 1, . . . , n. When the points 
{xi,yi) are joined, the shape of the obtained graph can hint at the increasing monotonicity of the true 
regression curve, g° say, assuming the model yi = g°{xi) + ej, with the unobserved errors. This shape 
restriction can also be a feature of the scientific proble m at hand, and hence the need for estimat i ng the 



Barlow et al. 



(Il972h and 



Robertson et al. 



mm 



true curve in the class of antitonic functions. We refer to 
for examples. The weighted Least Squares estimate of g° in the class of isotonic functions taking yi at x 
is the unique minimizer of the criterion 



L{a) 



^ Wi{yi 



1=1 



(1) 



over the class of vectors a G such that ai < 02 . . . < a„ where wi > 0,W2 > 0, . . . ,Wn > are 
given positive weights. In what follows, we will say that a vector v G M" is increasing or isotonic if 
vi < . . . < Vn, and use the notation v < w for v,w if the inequality holds componentwise. 
It is well known that the solution a* of the Least Squares problem in ([T]) is given by the so-called min-max 
formula; i.e.. 



max mill Av{{s, . . . , t}) 

s<i t>i 



(2) 



where Av 



van Eeden 



' s... t}) = X;-=s yi^i/ Yli=s '^i (see e.g. 



Barlow et al. 



1972 ). 



(Il957a|]bh has generalized this problem to incorporate known bounds on the regression function 



to estimate; i.e., she considered minimization of L under the constraint 



CLL 1^ 0. < au, 



(3) 



for two increasing vectors and aij. As in the classical setting, the solution of this problem admits 
also a min-max representation. The PAVA can be ge neraUzed to efficiently c ompute this solution and 
has been implemented in the R package OrdMonReg (IBalabdaoui et al.U2009h . Computation relies on a 
suitable functional M defined on the sets yl C {1, . . . , n} which generalizes the function Av in This 
functional for the bounded monotone regression in Q is given by 

M{A) = (^Av{A) VmaxoL^ Aminau 



Barlow et al. 



(Il972l. p. 57), where a 



where min^ v = miiijgyi Vi and max^^ v = maxjg^ Vi. Compare 
functional notation is used. However, in the latter reference no formal justification was given for the form 
of the functional M nor for the validity of (the modified version of) the PAVA, see the discussion after 
Theorem 12. II 
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Chakravartil (Il989h discusses the bounded isotonic regression problem for the absolute value criterion 
function, yielding the bounded isotonic median regressor. He proposes a PAVA-like algorithm as well, and 
establishes some c onnections to linear programmin g theory. Unbounded isotonic median regression was 



first considered by [Robertson and WaltmanI (|1968r) . who provided a min-max formula for the estimator 
and a PAVA-like algorithm to compute it. They also studied its consistency. 

Now suppose that instead of having only one set of observations yi , . . . , yn at the design points xi, . . . ,Xn, 

we are interested in analyzing two sets of data yi , . . . , ?/„ and zi, . . . ,Zn observed at the same design 

points. Furthermore, if we have the information that the underlying true regression curves are increasing 

and ordered, it is natural to try to construct estimators that fulfill the same constraints. 

The cuiTcnt paper presents a solution to this problem of estimating two isotonic regression curves under 

the additional constraint that they are ordered. This solution is the unique minimizer (a* ,b*) of the 

criterion 

n n 

L2{a,b) = ^wi^i{yi - aif + ^W2,i{zi - bif (4) 

i=l 1=1 

over the class of pairs of vectors (a, 6) G M" x M" such that a and b are increasing and a < b, with wi 
and W2 given vectors of positive weights in W^. 

The problem was motivated by an application from me chanical engineering. W e will make use of experi- 



mental data obtained from dynamic material tests (see iShim and Mohri. 120091) to illustrate our estimation 
method. In engineering mechanics, it is common practice to determine the deformation resistance and 
strength of materials from uniaxial compression tests at different loading velocities. The experimental 
results are the so-called stress-strain curves (see Figure [T|), and these may be used to determine the de- 
formation resistance as a function of the applied deformation. The recorded signals contain substantial 
noise which is mostly due to variations in the loading velocity and electrical noise in the data acquisition 
system. 

The data in this example consist of 1495 distinct pairs (xj, y-i) and (xj, Zi) where Xi is the measured strain, 
while Ui (gray curve) and Zi (black curve) correspond to the experimental stress results for two different 
loading velocities. The true regression curves are expected to be (a) monotone increasing as the stress is 
known to be an increasing function of the strain (for a given constant loading velocity), and (b) ordered 
as the deformation resistance typically increases as the loading velocity increases. In Section [3l we show 
the resulting estimates as well as a smoothed version thereof. 

We will show that minimizing L2 is equivalent to minimizing another convex functional over the class of 
isotonic vectors a G M". By doing so, we reduce a two-curve problem under the constraints of monotonic- 
ity and ordering to a one-curve problem under the constraint of monotonicity and boundedness. Actually, 
we can even perform the minimization over the class of isotonic vectors (ai, . . . ,an-i) of dimension 
n — 1 satisfying the constraint oi < ... < On-i < o-n we can explicitly determine a* by a gen- 
eralized min-max formula (see Proposition 12.31 ). The solution of this equivalent minimization problem, 
which gives the solution a* (and also b* because it is a function of a*), is computed using a projected 
subgradient algorithm where the projection step is performed using a suitable generalization of the PAVA. 
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Alternatively, the solution can be computed using Dykstra's algorithm (lDykstral.ll983[) . This point will be 
further discussed in Section [3 



We would like to note that 



Brunk et al 



(119661) considered a related problem, that of nonparametric Max- 
imum likelihoodestimation of two ordered cumulative distribution functions. In the same class of prob- 
lems, iDykstral (119821) treated estimation of survival functio ns of two stochast i cally o rdered random vari- 
ables in the presence of censoring, which was extended by iFeltz and Dykstral (Il985h to > 2 stochasti- 
cally ordered random variables. The theoretical solution can be related to the well-kn own Kaplan-Meier 



estim ator and can be computed using an iterative algorithmic procedure for > 3 (see lFeltz and Dykstra , 



1985 



, p. 1016). Th e ^yn— asymptotics of the est imators for N = 2, whether there is censoring or not, 
were established by IPreestgaard and HuangI (119961) . 

The paper is organized as follows. In Section [U we give the characterization of the ordered isotonic 
estimates. We also provide the explicit form of the solution of the related bounded isotonic regression 
problem where the upper of the two isotonic curves is assumed to be fully known. 
In Section[3]we describe the projected subgradient algorithm that we use to compute the Least S quares es- 
timat ors of the ordered isotonic regression curves, discuss the connection to Dykstra's algorithm (IDykstra , 
1983h . and apply the method to real data from mechanical engineering. The technical proofs are deferred 
to appendices |A] and |B] 
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2 Estimation of two ordered isotonic regression curves 



If the larger of the two isotonic curves was known, then there would of course be no need to estimate it. If 
we put ajj = aP, the weighted Least Squares estimate a* of the smaller isotonic curve is the minimizer of 

n 

i=l 

where w G M" is a vector of given positive weights, and a S , the class of isotonic vectors a G 
such that a < aP and a*^ G M". When the components of are all equal, the vector oP will be assimilated 
with the common value of its components as done in Proposition [33]below. 

The notation will be used again hereafter to denote the class of isotonic vectors e M" such that 

V <w. 

The statement of ] 



Barlow et al. 



(119721. p. 57) implies that if we define 



M{A) = Av{A) A niina° 

for a subset A C {1, . . . then the solution a* can be computed using an appropriately modified 
version of the PAVA. 

Theorem 2.1. For i = 1, . . . ,n, we have 

a* = maxminM({s, . . . , t}) = max min ( Av{{s, . . . ,t}) Aa[ 

s<i t>i s<i t>i V 

To keep this paper at a reasonable length, the proof of Theorem 12. II is omitted. A short note containing a 
more thorough discussion of the one-curve problem and a proof of Theorem l2.1l can be obtained from the 
authors upon request. A general description of the modified PAVA and a proof that it works whenever the 
functional M satisfies the so-called Averaging Property can be found in Section [3] 
We now return to the main subject of this paper. Theorem 12. 11 is crucial for finding the Least Squares 
estimates of two ordered isotonic regression curves. In particular, the result will be used to develop an 
appropriate algorithm to compute the solution. 

Let yi , . . . , y„ and zi , . . . , z„ be the observed data from two unknown isotonic curves g° and ^2 such that 
9i < 92- Given two vectors in of positive weights wi and W2, we would like to minimize (01) over the 
class of pairs of vectors (a, h) G x R" such that a and h are isotonic and a < b. Call this class X„. 

Existence and uniqueness of the solution. They follow from convexity and closedness of X„ and strict 
convexity of L2. 

Characterization of the solution. For completeness, we give the characterization of the solution of 
minimizing ^ over X„; i.e, a necessary and sufficient condition for (a, b) G X„ to be equal to this 
solution. Let ii < . . . < such that ii = l,ik = n and 

ai = ■■■ = a*h< a*h+i = ■■■ = a*^-i < • • • < a*^ = . . . = a^. 
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We call B^. (resp. Bf.) a set of indices {ij, . . . , ij+i — 1}, j = 1, . . . ,k — 1 such that aj\ = 6* (resp. 
a* < b*,). Similarly, let h <...< Ir such that h = 1, = n such that 



6t = . . . = < +1 = . . . = 6,; -1 < . . . < = . • • = 

and call C^, (resp. C^^ ) a set of indices {Ij, . . . , Ij^i — 1}, j = 1, . . . , r — 1 such that ftj* = a^* (resp. 
Theorem 2.2. T/ie pa/r (a*, b*) S X„ w f/ie minimizer of ([4]) if and only if 

n n 

X^K* - yi)ici'i - ai)wi^i + ^(^r - Zi){b* - bi)w2,i > 0, V (a, b) £ J„ (5) 

i=l 

X] ~ = 0, and (6) 



1=1 



^ (6:-z,)6>2,. = 0. (7) 



Proof. See Appendix [A] 

An explicit formula in the sense of a min-max representation similar to ^ of (a* , 6* ) turned out be to 
hard to find. However, since a* (resp. b*) is also the minimizer of 



^(a - Vifwi^i f resp. ^{b - Zifw2j 



i=l i=l 

over the class (resp. the class of isotonic vectors 6 G R" such that b> a*), Theorem [XT] implies that 

a* = maxmin {Avi{{s, . . . ,t}) Abl) (8) 

s<i t>i 

b* = maxmin {Av2{{s, . . . , t}) V 4) (9) 

s<i t>i 

for z = 1, . . . , n, where 
foT A C {!,... ,n}. 

Thus, the solution (a*, 6*) is a fixed point of the operator P : Tn —>■ In defined as 

P{{a,b)) = {Pi{b),P2ia)) (10) 
= I maxmin {Avi{{s, . . . , t}) A 6^)) maxmin {Av2{{s, . . . , t}) V at] 

y s<i t>i s<i t>i 

However, this fixed point problem does not admit a unique solution. Therefore, there is no guarantee 
that an algorithm based on the above min-max formulas yields the solution, except in the unrealistic and 
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uninteresting case where the starting point of the algorithm is the solution itself. To see that P does not 
admit a unique fixed point, note that the minimizer of the criterion 

n n 
i=l i=l 

is a fixed point of P for any B > Q. Therefore, a computational method based on starting from an initial 
candidate and then alternating between ^ and ^ cannot be successful. In parallel, we have invested a 
substantial effort in trying to get a closed form for the estimators. Although we did not succeed, we were 
able to obtain a closed form for a\ (and by symmetry for 6* ). 

Proposition 2.3. We have that 

a\ = imi\Avi{{l, . . . A^min^M({l, . . . . . ,t'}) 

where 

M{A,B) = = -^^ . 

By symmetry, we also have that 

6* = max Af2({i, • • • , ''^}) V may. M[{t' ^n},{t, ... ,n}). (11) 

t<n t<t'<n 

Some remarks are in order. The expressions obtained above indicate that the Least Squares estimator 
must depend, as expected, on the relative ratio of the weights wi and W2- In particulars if tt;2 = (resp. 
wi = 0), the expression of a* (resp. 6* ) specializes to the well-known min-max formula in the classical 
Least Squares estimation of an (unbounded) isotonic curve. The expression of 6* is essential for our 
subgradient algorithm below. 

Proof of Proposition \2.3\ See Appendix lAl 

In the next section, we describe how we can make use of the min-max formula in ^ to compute the 
estimators using a projected subgradient algorithm. As mentioned above, we use in this algorithm the 
identity (fTT]) given in the previous proposition. 



3 Algorithms and Application to real data 

In this section, we show that the bounded isotonic estimator can be computed using the well-known PAVA, 
or to be more precise a modified version of it. Recall that the bounded isotonic estimator in the one-curve 
problem is given by 

a* = max min M({s, t}) 

s<i t>i 
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where M{A) = Av{A) V max^ for any ^ C {1, . . . , n}. That a* can be computed using a PAVA is 
a consequence of a more general result. Namely, that a functional M of sets ^ C { 1 , . . . , n} satisfies 
what is referred to as the Averaging Property , (see lChakravartil.ll989l. p. 138), al so called Cauchy Mean 



Value Property by iLeurgansI (|l98ll. Section 1). See also 



Robertson et al. 



d 19881. p. 390). Note that in 



the classical unconstrained monotone regression problem, the min-max expression of the Least Squares 
estimator follows from Theorem 2.8 in 



Barlow et al. 



mi± p. 80). 



3.1 Getting the min-max solution by the PAVA 

First, let us describe how the PAVA works for some set functional M. 

• At every step the current configuration is given by a subdivision of {1, . . . , n} into k subsets 5i = 
{1, . . . , ii}, 5*2 = {n + 1, . . . , 12], Sk = {ik-i + 1, . . . , n} for some indices 1 = io < h < 
12 < ■■■ < ik-i <ik = n. 

• The initial configuration is given by the finest subdivision; i.e., Ij = {j}. 

• At every step we look at the values of M on the sets of the subdivision. A violation is noted each 
time there exists a value j such that M{Sj) > M{Sj^i). We consider the first violation (the one 
corresponding to the smallest j) and then merge the subsets Sj and Sj+i into one interval. 

• Given a new subdivision (which has one subset less than the previous one), we look for possible 
violations. 

• The algorithm stops when there are no violations left. 

Since for any violation a merging is performed (thus reducing the number of subsets), it is clear that the 
algorithm stops after a finite number of iterations. 



We require now the se t functional M to satisfy the fo llowing property. See ILeurgansI (Il98ll. Section 1), 



Robertson et al. 



(Il988l. p. 390) and lChakravartil d 19891 . p. 138). 



Definition 3.1. We say that the functional M satisfies the Averaging Property if for any sets A and B 
such that An B = f}> we have that 

m.\n{M{A),M{B)} < M{AU B) < max{M (A) , M (B)} . 



If h and it; > are given vectors G M", then beside 



Ah^Av{A) = y^^Wjhj/ y^^wj, 
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the following examples of functions also satisfy the Averaging Property : 



A 



A ^ 



I — > 



\^Av{A) V max hjj A nun with two vectors G MJ^, 

min h = min hi , 

A ieA 



A 




where the arg min is taken to be the smallest m in case non-uniqueness occurs 



A 



I — > 



max h = max hi. 

A i£A 



Note that the maximum, the minimum and the sum of two functional satisfying the Averaging Property 
satisfy the same property as well. 

Theorem 3.2. The final configuration obtained by the PAVA is such that the two following properties are 
satisfied. 

1. The functional M is increasing on the sets of the subdivision. 

2. If one of the sets Sj = C U D is the disjoint union of two subsets C = {ij-i + 1, . . . , A;} and 
D = {A; + 1, . . . ,ij}, then M[C) > M[D); i.e., a finer subdivision would necessarily cause a 



Proof. The fact that M is increasing on the final configuration is an easy consequence of the absence of 
violations (otherwise the algorithm would not have stopped). 

As for the second part of the property, note that this is satisfied by the initial configuration (since no set is 
the disjoint union of two non-trivial subsets), as well as by any configuration that one could obtain after 
the first merging (since a merging occurs only because of a violation). Now we will use an inductive 
reasoning. 

To this end, we have to check two situations: Suppose we merge two subsequent sets A and B and want to 
check whether there is a violation on C and D, with A^ B = C ^ D. We are in one of the two following 
cases: either A = AiiJ A2, C = Ai and D = A2^ B, or: B = Bi\J B2, C = A[J Bi and D = B2 (the 
case C = A and D = B is trivial). 
In the first case, if we suppose M {D) > M{C), we get 

M{A2 UB)> M{Ai), M{A2) < M{Ai), M{B) < M{A) = M{Ai U A2), 

(the first inequality follows by assumption, the second by induction, and the third is true since A and B 
have been merged) and this is impossible since one would conclude that 



and hence M{A) > M{B) > M{Ai) > M(A2), which implies M( A) > max{M{Ai), M{A2)}, 
which contradicts the Averaging Property . 



violation. 



max{M(^2),M(S)} > M{Ai) > M{A2) 
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In the second case we would have 



M(A U Bi) < M{B2), M{B2) < M{Bi), M(A) > M{B) = M{Bi U B2), 
which implies 

mm{M{A),M{Bi)} < M{B2) < M{Bi), 

and then mm{M{A), M{Bi)} = Ad{A) and M{A) < M{B2) < M{Bi), which contradicts either 
M{A) < M (B) or the Averaging Property . □ 

Theorem 3.3. If {Sj)j is the partition obtained at the end of the PAVA described above, then rrii = 
M{Sj.) such that i G Sj. takes the same values given by the min-max formula for the index i. 

Proof. See Appendix |Al 

3.2 Shor's projected subgradient and Dykstra's iterative cyclic projection algorithm 

The minimization problem considered in this paper can be easily recognized as a projection problem onto 
the intersection of the three following closed convex cones in R" x 

{(a, 6) : a is increasing}, {(a, 6) : 6 is increasing}, and {(a, 6) : a < 6}. 

Projections onto the first two cones can be computed by PAVA, and onto the last one by replacing 
the components of each pair {ai,bi) violating the constraint (i.e. Oj > 6j) by the weighted avera ge 



{wi^iUi + W2,ibi)/{wi^i + W2,i) of Oj and 6j. Implementation of Dykstra's algorithm (IDykstral. Il983r) is 
then straightforward. 

Yet, our algorithm has preferable features as we will now explain. The algorithm developped by Dykstr a 



is well-suited for projections onto intersections of convex sets or half-spaces (see lBregman et al.L 120031) . 
while the algorithm we propose can handle a larger class of minimization problems which involve the 
set of isotonic vectors, and are not necessarily projections. For instance, simple modifications of our 
algorithm would allow us to minimize any objective function of the form 

n 

(a, b) 1-^ F{a, wi) + ^ W2,i{zi - bif 

i=l 

under the same constraints on a and b, where F is any convex and differentiable function. The second 
quadratic term can be also replaced by a different penalization term depending e.g. on an Lp-distance. In- 
deed, it suffices to modify the computations involved in the PAVA by adapting them to various functional 
satisfying the Averaging Property (see Section [3]|. 

Our algorithm is easy to understand and is only based on a classical gradient method. Once the minimiza- 
tion is performed with respect to one of the variables, the objective function with respect to the remaining 
variable is still explicit, but no more differentiable. This is the main reason for which the algorithm is 
actually a subgradient descent. We believe that the explicit nature of the computations in our subgradient 
algorithm are exactly the key feature for the possibility of understanding and/or modifying it. 



10 



However, we would like to point out the merits of Dykstra's algorithm in this specific setting. Since it is 
tailored for a Least Squares problem, and because only three very simple projection cones are involved, 
Dykstra's algorithm (see below for details) computes the minimum of the criterion L2 given in (01) faster 
than the subgradient algor it hm, although Dykstra's algorithm is typically considered to be rather slow 



(see e.g. iMammenl. 



1991a or 



Birke and Dette. 



2007 ). No te that the choice of the stopping criterion in this 



algorithm may be delicate, see lBirgin and Raydad (|2005h . However, this was not an issue in our setting. 



3.3 Preparing for a projected subgradient algorithm 

The following proposition is crucial for computing the ordered isotonic estimators via a projected subgra- 
dient algorithm. 

Proposition 3.4. Let ^' be the criterion 

. . . = ^(^max{Gs,i /\bs) - Vi^ wi^i + '^{bi - Zifw2,i (12) 



i=l 



i=l 



which is to be minimized on the convex set 



= {(61, . . . , hn-l) e M"-^ : 61 < 62 < . . . < bn-l < b*J 



where 



Gs,i = inmAvi{{s, . . . ,t}) and bn = b*^ in ([T2 



t>i 



The criterion ^ is convex. Furthermore, its unique minimizer . . . , 6**,]^) equals (b* 



' • • • ' "n~lJ- 



Proof. Let us write 



T = I^ = {a= (ai,...,a„) G M" : ai < . . . < a„,}, 



J: = {ft = (61, . . . , 6„) : (61, . . . , bn-i) G I'Ju and bn = b^} 



and consider 



= {a : a G X and a < b} 



for b £ I*. 

Now note that the min-max formula in (HI) allows us to write 



n-l 



( max{Gsj A bs) - yA wij + ^{bj - Zjfw2,j 



n-l 



i=i 
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Hence, we have for b £ T* 



n-l 



^{bi, . . . ,bn-i) = mm^{aj - yjfwij + '^{bj - Zjfw2,j 



n-l 



= ^i^j{b) - Vjfwij + ^{bj - Zjfw2,j 
i=i 3=1 

where aj{b) = maXs<j(Gsj A bg) is the j-th component of the minimizer of the function X]"=i(aj 
yj)'^wij in T^j. Let A G [0, 1], and b and b' in X*. By definition of and X^', we have that 



A d{b) + (1 - A) a{b') < A 6 + (1 - A) 6' 



and hence 



2 

^(a,(A6 + (l-A)6')-%) 



2 



< (a a(6) + (1 - A) a(6') - yA wi,j 



i=i 

n 



2 



<A^ (ai(^)-yij u;ij + (l-A)^ \o-j{b') - Vjj wij. 

This shows convexity of the first term of Convexity of ^ now follows from convexity of the function 
J]j=i — Zj)'^W2j and the fact that the sum of two convex functions defined on the same domain is also 
convex. □ 
The idea behind considering the convex functional ^ is to reduce the dimensionality of the problem as 
well as the number of constraints (from 3n — 2 to n — 1 constraints). Once ^' is minimized; i.e, the 
isotonic estimate b* is computed, a* can be obtained using the min-max formula given in ([8]l. However, 
the convex functional ^ is not continuously differentiable, hence the need for an optimization algorithm 
that uses the subgradient instead of the gradient as the latter is not defined everywhere. 

3.4 A projected subgradient algorithm to compute &i, . . . , 

To minimize the non-smooth convex function ^' we use a projected subgradient algorithm. Since the 
gradient does not exist on the entire domain of the function, one has to resort to computation of a sub- 
gradient, the analogue of the gradient at points where the latter does not exist. As opposed to classical 
methods developed for minimizing smooth functions, the procedure of searching for the direction of de- 
scent a nd steplengths is entirely different. The classical reference for subgradient algorithms is 



Shor 



(119851) . iBoyd et all (120031) provide a nice summary of the topic, including the projected variant. Note that 



a recent application in statistics of the subgradient al gorithms gives no w the possibility to compute the 



log-concave density estimator in high dimensions; see lCule et all (120081) . 
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The main steps of the algorithm. Now recall that the functional ^ should be minimized over the 
(n — 1)— dimensional convex set given in Proposition |33] Of course, this is the same as minimizing 
^ over the n— dimensional convex set {(61, . . . , bn) | &i < • • • < bn~i}, starting with an initial vector 
{bf'\ . . . , bn^) such that bn^ = 6* and constraining the n— th component of the sub-gradient of ^ to be 
equal to 0. 

Given a steplength r^, the new iterate 
is given by 



at the fc— th iteration of a subgradient algorithm 



hk - TkDj, 



where Dk is the subgradient calculated at the previous iterate; i.e., = V^{vk) (see Appendix IbT). 
However, it may happen that i;a;+i is not admissible; i.e. {bi^^, . . . , b^'^\) does not belong to ^^"-i- 
When this occurs, an L2 projection of this iterate onto is performed. This is equivalent to finding 
the minimizer of 



i=l 



,fc+l\2 



over the set Tn". The latter problem can be solved using the generalized PAVA for bounded isotonic 
regression as described above. 

The computation of the subgradient Dk is described in detail in Appendix |Bj As for the steplength r^, we 
start the algorithm with a constant steplength. Once a pre-specified number of iterations has been reached 
we switch to 



Tk+l 



0.1 1 



D 



k\\2) 



where 7^ := is such that 0<7fc^0asA:^oo and Yl'k^i 7fc = 00. Here, || • II2 denotes 

the L2-norm of a vector in M"^. This combination of constant and non-summable diminishing steplength 
showed a good performance in our implementation of the algorithm over other classical choices of (7^)^- 
Furthermore, convergence is ensured by the following theorem. 



Theorem 3.5. iBovd et al\ ((200% ) A subgradient algorithm complemented with least-square projection 
and using non-summable diminishing steplength yields for any r] > after k = k{rj) iterations a vector 
b^ := {bi, ... ,b^) such that 



min ^'(6*) - ^'(6*) < r], 

i=l,...,k 

where 6* = . . . , 6* ) is the vector given in Proposition \3.4\ 



The proof can be found in iBoyd et al.l (120031) by combining their arguments in Sections 2 and 3. Note 
that in our implementation we do not keep track of the iterate that yielded the minimal value of ^I*, since 
we apply a problem-motivated stopping criterion that guarantees us to have reached an iterate that is 
sufficiently close to b* = {b^, . . . , 6*). 
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Choice of stopping rule. Since in subgradient algorithms the convex target functional does not nec- 
essarily monotonically decrease with increasing number of iterations, the choice of a suitable stopping 
criterion is delicate. However, in our specific setting we use the fact that (a*, 6*) is a fixed point of the 
operator P defined in (flOl ) where a* = Pi{b*); the solution of ([T]) with upper bound b*. This motivates 
iterating the algorithm until the difference of entries of the two vectors b'^ and 6^ where 

6| = P2oPi(6'=) 

is below a pre-specified positive constant 6. 



The implementation. The Dykstra and the projected subgradient algorithms as well as the gener- 
alized PAVA for comp uting the solution in the one curv e problem under the constraints in ^ were 
all implemented in R (IR Development Core Teaml l2008h . The con^esponding package OrdMonReg 



Balabdaoui et al. 



(120091) is available on CRAN. Note that the data analyzed in Section 1331 is made avail- 
able as a dataset in OrdMonReg. 

To conclude this section on th e algorithmic aspects of our work, we would like to mention the work 
by iBeran and Diimbgenl (120091) who propose an active set algorithm which can be tailored to solve the 
proble m given in ^ for an arbitrary number of ordered monotone curves. However. iBeran and Diimbgen 
(l2009h do not provide an analysis of the structure of the estimated curves such as characterizations and 
rather put their emphasis on the algorithmic developments of the problem. 



3.5 Real data example from mechanical engineering 

We would like to estimate the stress-strain curves based on the available experimental data for two differ- 
ent velocity levels (see Figure [T]). The expected curves have to be isotonic and ordered. The data consist 
of 1495 pairs {xi,yi) and (x, , Zj). The values of the measured strain of the material (on the x-axis), are 
actually defined as (— ) the logarithm of the ratio of the current over the initial specimen length. The 
values are positive and take the maximal value 1, which corresponds to a maximum shortening of 63%. 
Furthermore, since the stress measurements for different velocities are not performed exactly at the same 
strain, the values of the stress have been interpolated at equally spaced values of the strain. As pointed out 
by a referee, this will induce correlation between the strain data. Even if the strain measurement were not 
interpolated, having correlated stress measurements is rather inevitable in this particula r application be- 



cause of the data processing procedures associated with the measurement technique (see lShim and Mohil . 



200% . The estimation method is however still applicable. When studying statistical properties of the iso- 
tonic estimators such as consistency and convergence, the correlation between the data should, of course, 
be taken into account. 

In such problems, practitioners usually fit parametric models using a trial and error approach in an attempt 
to capture monotonicity of the stress-strain curves as well as their ordering. The methods used are rather 
arbitrary and can also be time consuming, hence the need for an alternative estimation approach. Our 
main goal is to provide those practitioners with a rigorous way for estimating the ordered stress-strain 
curves. 
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In Figure |2] (upper plot) we provide the original data (black and gray dots) and the proposed ordered iso- 
tonic estimates a* and b* as described above. Being step functions, the estir nated is o tonic curve s are non- 
smoot h, a well known drawback of isotonic regression, see among others IWrighti (|l978r) and iMukerjee 
(|l988h . The latter author pioneered the combination of isotonization followed by kernel smoothing. A 
th orough asy r nptotic a nalysis of the smo othed isotonized and the isotonic smooth estimators was given 
by iMammenI (|l991bl '). iMukerjed (|l988l. p. 743) shows that monotonicity of the regression function is 
preserved by the smoothing operation if the used kernel is log-concave. Thus, we define our smoothed 
ordered monotone estimators by 



dlix) 



TA=lKh{x-Xi) 



and bl{x) 



TA=lKh{x-Xi) 



for < X < 1. For simplicity, we used the kernel Kh{x) = (p{x/h) where (p is the density function of 
a standard normal distribution which is clearly log-concave. Figure [2] (lower plot) depicts the smoothed 
isotonic estimates. We set the bandwidth to h = 0.1n~^/^ ?s 0.023. 

Motivated by estimation of stress-strain curves, an application from mechanical engineering, we consider 
in this paper weighted Least Squares estimators in the problem of estimating two ordered isotonic regres- 
sion curves. We provide characterizations of the solution and describe a projected subgradient algorithm 
which can be used to compute this solution. As a by-product, we show how an adaptation of the well- 
known PAVA can be used to compute min-max estimators for any set functional satisfying the Averaging 
Property. 
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A Proofs 

Proof of Theorem \2?2\ Suppose that (a*, 6*) is the solution. For e G (0, 1), and (a, h) G J„ consider the 
pair (a% 6^) G M'' x M" defined as 

= a* + e{a — a*) 
= b* + e{b-b*). 

For i < j G {1, . . . , n}, we have 

Oj — a\ = (1 — e)(a* — a*) + e{aj — aj) > 
b]-b\ = {l-e){b*-b*) + e{b,-bi)>0. 
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Also, for z G {1, . . . , n} we have 

= {l-e){a*-b*)+e{a,-bi)<0. 

Hence, (a*^, ft*^) G T„, and 

< lim-(L2(a^6^)-L2(a^f'*)) 

e\0 e 

n n 
j=l i=l 

yielding the inequality in Q. 

Now consider the vectors a'^ and b*^ such that for Z = 1, . . . , n 

af = «r + e < l^g^i 

= 

Let r < s G {1, . . . , n}. If r ^ S/^, and s ^ B}., then a| - = a* - a* > 0. If r G Bj. and s ^ 
then a* > a* and al — = a* — a* + ea* > for |e| small enough. The same reasoning applies if 
r ^ Bj, and s G Bj, . Finally, if r, s G Bj, , then - = 0. 

Now, for r G {1, . . . ,n}, we have a,^ = a* < b* if r ^ Bj^. Otherwise, = a*(l + e) < 6* if |e| is 
small enough. Hence, (a*^, b"^) G Xn> and 

= \iui-(L2ia\¥) - L2{a\b*)) 

e\0 e 



— 1 

r=l 

Summing up over all the sets Bj^ yields the identity in Q. We can prove very similarly the identity in (|7]l. 
Conversely, suppose that (a*, 6*) G Xn satisfies the inequality in For any (a, b) G Xn, we have 



^ n 1 " 

L2{a,b)-L2{a*,b*) = -J2{a^-a*fwi,^ + -^{bi-b*fw2, 

i=l i=l 
n 



i=l 
n 

+ ^(b* - Z,){bi - b*)w2, 
1=1 

> 0. 



We conclude that (a* , 6* ) is the solution of the minimization problem. □ 
Proof of Proposition \2.3\ Let e > and consider (a, 6) G M" x M" such that 



Oj = < - e liG{i,...,i}> * e • • >™} 
bi = b* 
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for z = 1, . . . , n. For small e, (a, b) G X„. Using the characterization in Theorem 12.21 it follows that 



implying that 



or equivalently 



^{a*-yj)wij < 



^{al - yj)wij < 0, for t G {1, . . . , n} 



o* < min Avidl, . . . , t}). 



Now, consider (a, 5) G R" x R" such that 

= - elje{i,. ..,<}'*£{!'••• ''^I 
= ~ eljG{i,.. .,*'}> 

for j = 1, . . . , n, with e > 0. For small e, we have that (a, b) G X2, and hence 

t t 
^(a* - 2/j>ij + ^{b* - Zj)w2,j < 0. 
j=i i=i 



It follows that 



that is 



We conclude that 



t t' 
^{al - yj)wij + ^{al - Zj)w2,j > 0, 
3=1 j=i 



at < min M({1, . . . , t}, {1, . . . , t'}). 

Kt' <t<n 



Oi < min^t;i({l,. . . , t}) A ^min^ M({1, . . . . . . 

Now if al < b^, let ii{l, . . . , n} be such that al = . . . = a*_^. Then (a, 6) is such that 

Oj = + <^ lie{i,...,n} 
5, = b* 

for j = 1, . . . , n is in T„ when |e| is small enough. It follows that 

Avi{{l,...,h}) = al. 
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If al = bl, and i[ and i'( are such that a* = . . . = a*, and b^ = . . . = b*„, then (a, 6) such that 

aj = a* + e 
bj = b* + e lje{i,...y^'} 
for j = 1, . . . , n is in T„ for |e| small enough. Hence, 

al=M{{l,...,i[},{l,---,i'l}). 

(note that i'( < Therefore, 

al =min^t;i({l,...,t}) A max M({1, . . . ,t},{l, . . . ,t'}). 

t>i t>t'>i 

The expression of bl follows easily by replacing respectively yi and Zi by —Zn-i+i and —yn-i+i for 
i = 1, . . . ,n. O 



Proof of Theorem \3.3\ Consider a G M" given by 

Oj = max min M({s, t}) 

s<i t>i 

and also the subdivision into subsets Sj = {ij^i + l, . . . obtained by the PAVA. Let us denote by G~ 
(resp. the grid set of indices which correspond to points at the beginning (resp. end) of those subsets; 
i.e. of the form ij + 1 (resp. ij). 
We obviously have 

Oj < max min M(is,...,t}). 

Then, consider s ^ G^. This means that we have a set {s, . . . ,t} of the form B U C, C being a union 
of subsets in the subdivision and B a right subset of a set of the partition of the form AU B. We want to 
prove that M{{s, ...,t})= M{B U C) is either smaller than M(C) or M{A UBUC). Suppose this is 
not the case. Then we would have 

M{B UC)> M{C), M{B UC)> M{A UBuC), M{A) > M{B), 

where the last inequality is implied by the second property in Theorem 13.21 Yet, the second inequality, 
together with the Averaging Property , implies that M{A) < M{B U C). In the end we get 

M{B UC)> M{C), M{B U C) > M{A) > M{B), 

which contradicts the Averaging Property . 

We conclude that M({s, . . . , t}) is smaller than the value of M at a set which is a union of sets of the 
subdivision; i.e. either AVJ B U C or C itself. But on sets of this kind it is obvious, by the Averaging 
Property , that M is smaller than the value nit, since this is the maximal value of M on the intervals 
composing such a set (this is a consequence of M being increasing). Hence, M({xs, . . . ,xt}) < rrit, 
implying that 

Oj < max min mt = rrii. 

s<i t>i,teG+ 

The opposite inequality is obtained exactly in a symmetric way (first take s G G~, then prove that 
M{{xs, . . . , xt}) is larger than the value of M on a union of intervals). □ 
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B Computing the subgradient 



Computing the subgradient of \I' on a dense set. Consider the set 



D = 



{6=(6i,...,6„„i)EM"-^:6,^6,Vi/j, 



and bii ^ Gsj' \/l<i'<n — l,l<s<n 



1, 1 < i' < n 



We denote by (e^, . . . , e"~^) the canonical basis of M"~^. The set is a dense open subset of M"~^ 
where the function ^ is differentiable. Actually, for a fixed 6 G D, in the explicit formula for ^ there is 
no ex-aequo (up to possible equalities between the Gi^s terms). The same will be true in a neighborhood 
of b. For each value of i G {1, . . . , n}, we define the function 



Let us first consider i G {1, . . . , n — 1}. We define {sj^ , . . . , Sj^. } to be the set of indices s where 
maXs<i(Gs^i A 6s) is attained. 

If k = 1, then Gsi^,i A 6^1 > Gs,i A bs for all s G {1, . . . , i} \ {sj^}. This implies that the same strict 
inequalities will be true in a neighborhood of b and hence there are two cases: either the function is locally 
constant or the square of an affine function. Hence, 



Now if A; > 2, then this implies that only Gg^ = 1, . . . ,k can be equal (by definition of the set D), 
and hence the function is locally constant. Therefore, V^'j(6) = 0. 

For i = n, the calculation also requires distinction between the cases k = 1 and k > 2. Thus, if A; = 1 
and the maximum maXs<n(Gs,n A bs) is attained at Sj^ 7^ n, then 



If A; = 1 and Si^ = n (in this case bn = b*^ is known) or A; > 2, then V^„(6) = 0. Now the gradient 
V*(6) is given by 




• If 5,,^ >G,,^,„thenV*i(6) = 0. 

• If bs^^ < Gs,^,i, then V*i(6) = 2(^(C^,, A 6,, J - y^) wi^i e'n. 



• If 6,,^ >G,,^,„,thenV^'i(6) = 0. 

• If 6,,^ < Gs,^,n, then V^'„(6) = 2(^(G^,^,„ A 6., J - Vn) wi,n 



n n—1 



i=i 1=1 
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Calculating the subgradient of ^ at any point. Take now any point b e M"~ which does not neces- 
sarily belong to D. We want to approximate b by points of D in the perspective of using the following 
property: If is convex, ^ p, 7e ^ 7 as e ^ 0, and 7^ G 9^(pe), then 7 S (9^(p). This is useful 
when we only want to find one element of the subdifferential at a given point and we already know the 
gradients at nearby points. 
We use the following approximation: 

be = b + eu, where u = (1, 2, . . . , i, . . . , n — 1). 

We claim that b^ may belong to the complement of D for a finite number of values e at most. Indeed, for 
any pair with i ^ j, the equality hi + ie = bj + je is satisfied for a unique value of e, and for any 
i, i' and s, the same thing holds true for the equality Gi^s = bj' + ei' . Hence, there exists eo > such that 
for e g]0, eoi^ we have b^ ^ D, where the expression of the gradient is fully known by our calculations 
above. 

We can act as follows: Take b and fix i < n — 1. For any s < i, determine which one is minimal among 
Gi^s and bs- In case of equality, priority will be given to Gi^s since in the approximation with b^, the value 
of Gi^s would be smaller than bs + es. This way we classify the indices in two categories: The G-type and 
b-type. Next, look at all the indices si, . . . ,Sk realizing the minimum of Gj ^ V bs- If among si, . . . ,Sk 
there are some which are of the b-type, this would imply that in the approximation with b^, those indices 
will yield even a higher value for Gi^Sj V {bg^ + esj). In particular the maximal one will correspond to the 
largest b-type index since it is the one where the coordinate is increased the most in the approximation. 
Due to the fact that 6* is fixed, we adopt, for i = n, the convention that the index s = n is of the G-type 
when Gn,n A 6* is maximal. Thus, we can define the vector 

W,(6) = 2{{Gs,^,^^bs,J-yi)wl,ie''^ orO, 

where the index is the largest index of b-type such that Gj ^ A 6s is maximal (note that Sj^^ is always 
< n — 1). If no such index exists (i.e. if the maximal ones are all of G-type), then this is the case where 
the vector equals 0. Now consider 

n n—l 

= ^Wi(6) + 2^(6,-Zi)^2,^e\ 

i=l 1=1 

This vector belongs to d'^{b) by approximation and closedness of the subdifferential. 
Note that we would have obtained another element of the subdifferential if we had fixed a different 
order of priority on the coordinates of b; for instance the first index instead of the last one (if u = 
{1,2, . . . ,i, . . . n — 1) was replaced with (n— 1,...,2,1)). We could also have decreased (instead of 
increased) the components, thus giving priority to bs instead of Gi^s in the minimum Gi^s A ftg. In that 
case, we would have obtained for the subgradient of as soon as one of the components realizing the 
maximum was of the G-type. 
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